"METHOD FOR DETERMINING THE FORMAT OF SEISMIC TRACE DATA"

The present invention relates to a method for
identifying the format of seismic trace data and, more
particularly, to such a method which uses internal logic
rules, such as provided by artificial intelligence, to
identify the format.

The successful processing of data is highly
dependent upon the particular format in which the data has
been recorded. For example, companies that process
seismic traces acquire the data from many different service
companies, each of which records the data in its own
particular format(s). In order for the processing to
be successfully accomplished, the data needs to be placed
into a format that is compatible with the particular computer
programs that are to be used to process the data,
for example programs to increase the signal-to-noise
ratio. Various data recording standards have been published
by organizations; however, these standards have not
been widely adhered to, so seismic data processing includes
the need for an initial determination of the format of the
data that has been received from an outside source.

Most companies that process data have developed
or acquired a suite of data processing programs that process
the data in the many different formats, i.e., a
single processing program for each format or set of
related formats. Again, the problem is to select the
appropriate processing program for the particular format,
as well as to determine certain peculiar processing
parameters that are to be used in the processing.
Therefore, there is a need for a simple method of
determining the format of the data.

The present invention has been contemplated to
overcome the foregoing deficiencies and to meet the
above-described need. The present invention provides a method
of identifying, from a list of known data formats, a
particular format for a set of data. In the method, a
representation is created of at least a portion of the data,
and from that representation, characteristics of the data
are obtained. Utilizing predetermined logic rules,
supplied by experienced users and translated into a form used
by an expert system shell, the data characteristics are
matched to known data characteristics for each known data
format until a match is accomplished. Expert system
shells are programs written for programmable digital
computers that manipulate symbols in predefined ways. In
particular, they allow the backward chaining described in the
detailed description. Thereafter, a report is generated
indicating that the data is in a particular matched data
format. By using this program, a set of data can be
easily and quickly reviewed and an indication of the
particular data format is provided, as well as any particular
additional known parameters that are required in the
processing of the data.

The present invention provides a method of
identifying, from a list of known data formats, a particular
format for a set of data by using a programmable digital
computer. While this method can be utilized for any data
format, for the purpose of this discussion, the field of
use will be identified as processing seismic data used in
the exploration for oil and gas. Basically, once a
seismic tape containing data is acquired from an outside
source, it is first loaded on a computer system and a
representation of the content of the data is created.
This representation is a uniformly formatted representation,
i.e., a formatted hexadecimal dump of the data. An
example of a representation is shown in Table I.

The representation is electronically transmitted
to the computer where the programs of the present invention
reside. When the user invokes the program, the name of
the file containing the representation must be supplied to
the program, and other particular information (described
herein below) about the file that might be available may
be supplied. With the available information, the program
then scans the first portion of the representation to
identify the location and value of particular data items
to obtain characteristics of the data. Then, the program
identifies the particular data format, the name of the
particular processing program useful in processing data in
that format, and the values of any parameters that might be
needed during such processing.

Specifically, before the program of the present
invention can be run, it is necessary to transform the
data set into a uniformly formatted representation. The
first portion, for example 480 bytes, of each physical
record of the tape is generated in hexadecimal form.
Each record is then represented by a line that contains
the number of bytes in the physical record and the number
of the physical record, followed by 12 lines containing
the hexadecimal representation of the data. The hexadecimal
representation is arranged in groups of four nibbles
separated by a single space.
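The dump layout just described can be illustrated with a minimal Python sketch. It is not the program of the invention. The function name `format_record`, the exact layout of the header line (byte count, then record number) and the use of upper-case hexadecimal digits are assumptions made for illustration; only the 480-byte portion, the 12 lines per record and the four-nibble grouping come from the description.

```python
def format_record(record: bytes, record_number: int,
                  limit: int = 480, lines_per_record: int = 12) -> list[str]:
    """Return the dump lines for one physical record.

    Line 0 holds the record length in bytes and the record number
    (this header layout is an assumption). The following lines hold the
    first `limit` bytes in hexadecimal, in groups of four nibbles.
    """
    head = record[:limit]
    hex_text = head.hex().upper()
    # 480 bytes over 12 lines gives 40 bytes, i.e. 80 nibbles, per line.
    nibbles_per_line = (limit // lines_per_record) * 2
    lines = [f"{len(record)} {record_number}"]
    for start in range(0, len(hex_text), nibbles_per_line):
        chunk = hex_text[start:start + nibbles_per_line]
        groups = [chunk[i:i + 4] for i in range(0, len(chunk), 4)]
        lines.append(" ".join(groups))
    return lines


if __name__ == "__main__":
    sample = bytes(range(256)) * 2          # a made-up 512-byte record
    for line in format_record(sample, 1)[:3]:
        print(line)
```

With a 480-byte portion and 12 lines, each data line carries 20 groups of four nibbles, which keeps every line well under 100 characters.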

Once this has been transferred to the machine
where the program runs, the following information is asked
of the user:

     1.  The name of the file containing the input data.

     2.  Whether observers' notes are available for this
         data, and if yes:

         (a) the number of traces per seismic record.

         (b) the number of auxiliary traces per seismic
             record.

         (c) the sample interval used for recording the
             data.

         (d) the length of the trace data in seconds.

     3.  Whether a hardcopy of the representation is
         available to the user and, if so, the format used
         to record the trace data.

     4.  Whether the data was initially recorded in analog
         or digital form.

The representation of the data is then analyzed to obtain
data characteristics. For example, the program will scan
the data representation at particular locations and
retrieve the values at such locations. These characteristics
are thereafter used in the matching process with
logic rules. The logic rules utilized in the present
invention are a set of knowledge relationships captured
from experienced people and written using the Personal
Consultant Plus©, a computer program marketed by Texas
Instruments Corporation. The program accomplishes its
determination by the application of the set of rules, which
have been determined to solve this problem correctly about
80-85% of the time. Each of the rules is an independent
piece of information that is known to experienced people
who can make a parameter determination. The Personal
Consultant Plus© determines the manner and order by which
these rules are used in any particular run of the program.
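The scanning step described above, in which values are retrieved from particular locations of the representation, might look like the following Python sketch. The helper names `parse_record` and `value_at`, the 1-based byte numbering, and the big-endian unsigned interpretation are all assumptions made for illustration; the description does not state how the values are decoded.

```python
def parse_record(data_lines: list[str]) -> bytes:
    """Rebuild the leading bytes of a record from its hexadecimal dump lines.

    `data_lines` are the lines of four-nibble groups only; the header
    line carrying the byte count and record number is excluded.
    """
    return bytes.fromhex("".join(data_lines).replace(" ", ""))


def value_at(data: bytes, first_byte: int, last_byte: int) -> int:
    """Read an unsigned integer from 1-based, inclusive byte positions.

    Big-endian byte order is assumed here.
    """
    return int.from_bytes(data[first_byte - 1:last_byte], "big")


if __name__ == "__main__":
    # Made-up dump line: bytes 11-12 hold 0x002A, bytes 15-16 hold 0x0007.
    data = parse_record(["0000 0000 0000 0000 0000 002A 0000 0007"])
    print(value_at(data, 11, 12), value_at(data, 15, 16))
```

Values read this way would become the data characteristics that the logic rules are matched against.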

Some of the logical rules can be classified as
facts about the problem of program parameter determination.
Some of the rules can be classified as generally
disseminated practices about the problem of recognizing
seismic formats. A third class of rules was
developed after long discussions with experienced
people who are familiar with program and parameter
determination, and truly reflects their expertise in
performing these tasks. Table II shows examples of these
rules.

One of the strengths of this program is that
there can be any number of rules that have been accumulated
and coded into the system, thereby permitting a high
degree of confidence in the outputted format determination.

The predetermined logic rules developed are
applied against the characteristics of the data by a logic
process called backward chaining. In other words, a first,
or trial, data format is chosen from the list of known data
formats and set as a goal. For such data to be in the
first data format, one or several rules must be activated,
i.e., have all of their premises proved true. The
required rules are found and their premises are examined.
If the facts needed to determine the truth of the premises
are not known, then these facts are set as subgoals, and
the cycle of selecting rules occurs again. If a sufficient
set of rules is able to activate, there is a match
for the rules, and the data is in the first data format.
However, if the rules do not activate, then a new trial
data format is selected as a goal, the respective logic
rules are found, and the backward chaining of rules and
facts is applied as before. The program will continue
with each known data format until a match is found. If no
match is found, then the program will indicate that no
match was found.

When the program has determined, to the predetermined
satisfaction limit, the particular format of the
data, an indication or display is provided to the user of
the name of the preferred processing program and any
associated processing variables needed to process the data.
Six examples of the indication are provided in Table III.

Whereas the present invention has been described
in particular relation to the examples included herein, it
should be understood that other and further modifications,
apart from those shown or suggested herein, may be made
within the scope and spirit of the present invention.

TABLE II

Rule069 STD-RULES/antecedent

If 1) the sample interval used to record the data in
      microseconds is known, and

   2) the measure of certainty associated with the sample
      interval used to record the data in microseconds,

Then, it is definite (100%) that the sample interval used to
record the data in milliseconds is the sample interval used
to record the data in microseconds divided by 1000.

IF: SI IS KNOWN AND CERTAINTY SI
THEN: SI-MILLI = VALUE SI / 1000

PREMISE: ($AND
           (KNOWN FRAME SI)
           (MEASURE1 FRAME SI))

ACTION: (DO-ALL
          (CONCLUDE FRAME SI-MILLI
            (FQUOTIENT
              (VAL1 FRAME SI) 1000) TALLY 100))

ANTECEDENT: YES

Rule074 STD-RULES

If 1) both I*4 and R*4 are equally likely, and

   2) the measure of certainty associated with the format
      of the recorded data,

Then there is weakly suggestive evidence (20%) that the
format of the recorded data is R*4.

IF: SEL-DFMT (VAL FRAME DFMT) AND CERTAINTY DFMT
THEN: DFMT = R*4 CF 20

PREMISE: ($AND
           (SEL-DFMT
             (VAL FRAME DFMT))
           (MEASURE1 FRAME DFMT))

ACTION: (DO-ALL
          (CONCLUDE FRAME DFMT R*4 TALLY 20))

Rule072 STD-RULES

If the number of line headers is 0,

Then, 1) there is strongly suggestive evidence (80%) that the
         first 32 words of the binary header are all zeros,
         and

      2) there is strongly suggestive evidence (80%) that the
         tape is a variant of the SEG-Y format.

IF: NO-LH-1 = 0
THEN: VALUE-BH = GET-0-BH CF 80 AND A-SEGY CF 80

PREMISE: ($AND
           (SAME FRAME NO-LH-1 0))

ACTION: (DO-ALL
          (CONCLUDE FRAME VALUE-BH
            (GET-0-BH) TALLY 80)
          (CONCLUDE FRAME A-SEGY YES TALLY 80))
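
To make the structure of such rules concrete, the following Python sketch mirrors Rule069 above. It is only an approximation. The `Frame` type (a mapping from parameter name to a value and a certainty in percent) is a hypothetical stand-in for the shell's frame, and the second premise (MEASURE1) is not modeled because its exact semantics are not given here.

```python
# Hypothetical frame layout: parameter name -> (value, certainty in percent).
Frame = dict[str, tuple[float, int]]


def rule069(frame: Frame) -> bool:
    """Conclude SI-MILLI from SI; return True if the rule fired."""
    if "SI" not in frame:                # premise (KNOWN FRAME SI) fails
        return False
    si_micro, _certainty = frame["SI"]   # sample interval in microseconds
    # ACTION: conclude SI-MILLI = SI / 1000 with TALLY 100 (definite).
    frame["SI-MILLI"] = (si_micro / 1000, 100)
    return True


if __name__ == "__main__":
    frame: Frame = {"SI": (2000.0, 97)}
    rule069(frame)
    print(frame["SI-MILLI"])
```

A sample interval of 2000 microseconds is thereby recorded as 2 milliseconds with a certainty of 100%.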



TABLE III

Example 1. The system found that there were only seven
seconds of data in the file even though the user had indicated
that there were eight.

MPS-STD-1 CONCLUSIONS:

A major recommendation is as follows: Use the program EXCH
to reformat the file, fully accounting for all of the
subsidiary recommendations. (74%)

The complete list of parameters for the selected program is
as follows:

    Sample interval                2                   97%
    Number of regular traces       16                  87%
    Number of auxiliary traces     0                   87%
    Number of samples per trace    3500                74%
    Length of the trace header     240                 74%
    Number of line headers         2                   90%
    Data format                    R*4                 100%
    Record number position         bytes 11 and 12     100%
    Trace number position          bytes 15 and 16     100%

A subsidiary recommendation is as follows: It appears
the recording has been shortened to 7 seconds. (80%)

Example 2. The system notes that the record numbers are
large at the beginning of the file and might exceed the
program capability by the time the end of the file is reached.

MPS-STD-1 CONCLUSIONS:

A major recommendation is as follows: Use the program EXCH
to reformat the file, fully accounting for all of the
subsidiary recommendations. (83%)

The complete list of parameters for the selected program is
as follows:

    Sample interval                2                   97%
    Number of regular traces       48                  88%
    Number of auxiliary traces     0                   99%
    Number of samples per trace    3500                83%
    Length of the trace header     240                 83%
    Number of line headers         2                   90%
    Data format                    R*4                 100%
    Record number position         bytes 11 and 12     100%
    Trace number position          bytes 15 and 16     100%

A subsidiary recommendation is as follows: The data contains
record numbers that are greater than 15,000. The file can be
reformatted, but the records should be renumbered. (72%)

Example 3. The system notes that there is an additional
tenth of a second of data in the file.

MPS-STD-1 CONCLUSIONS:

A major recommendation is as follows: Use the program EXCH
to reformat the file, fully accounting for all of the
subsidiary recommendations. (40%)

The complete list of parameters for the selected program is
as follows:

    Sample interval                4                   95%
    Number of regular traces       2.4                 40%
    Number of auxiliary traces     0                   53%
    Number of samples per trace    1175                65%
    Length of the trace header     240                 65%
    Number of line headers         2                   90%
    Data format                    R*4                 100%
    Record number position         bytes 221 and 222   80%
    Trace number position          bytes 223 and 224   80%

A subsidiary recommendation is as follows: It appears that
there are actually 4.7 seconds of data, rather than the 4.6
seconds indicated in the observer notes. This file appears
to be in CDP sort sequence. (72%)

Example 4. For the case of a SEG-D formatted file, additional
parameters are not required to process the file.

MPS-STD-1 CONCLUSIONS:

A major recommendation is as follows: The program SEGD
should be used to reformat the file because it is in SEG-D
format.

Example 5. Here two problems were found. The record numbers
were not recorded and the traces have been shortened.

MPS-STD-1 CONCLUSIONS:

A major recommendation is as follows: Use the program EXCH
to reformat the file, fully accounting for all of the
subsidiary recommendations. (43%)

The complete list of parameters for the selected program is
as follows:

    Sample interval                2                   97%
    Number of regular traces       48                  54%
    Number of auxiliary traces     0                   86%
    Number of samples per trace    1000                94%
    Length of the trace header     240                 94%
    Number of line headers         2                   90%
    Data format                    R*4                 100%
    Record number position         unknown             43%
    Trace number position          bytes 15 and 16     54%

A subsidiary recommendation is as follows: It appears that
the recording has been shortened to 2 seconds. (80%) There
are no record numbers in the trace headers. The file can be
processed, but the record numbers will need to be generated
by renumbering. (43%)

Example 6. There may be a problem with the interpretation of
this case because the user didn't dump enough of the seismic
data file as input to the program.

MPS-STD-1 CONCLUSIONS:

A major recommendation is as follows: Use the program EXCH
to reformat the file, fully accounting for all of the
subsidiary recommendations. (43%)

The complete list of parameters for the selected program is
as follows:

    Sample interval                2                   97%
    Number of regular traces       18                  54%
    Number of auxiliary traces     0                   86%
    Number of samples per trace    3000                92%
    Length of the trace header     240                 92%
    Number of line headers         2                   90%
    Data format                    R*4                 93%
    Record number position         unknown             43%
    Trace number position          bytes 15 and 16     54%

A subsidiary recommendation is as follows: The number of
traces MAY be limited by the number of records that have been
dumped; you ought to dump more records and re-run this
consultation. Both the binary header and the observer notes
indicate that there are 48 traces.

CLAIMS

1. A method of identifying, from a list of
known data formats, the particular format for a set of
data, comprising:

     (a) creating a representation of at least a
     portion of the data;

     (b) from the representation, obtaining
     characteristics of the data;

     (c) utilizing predetermined logic rules,
     matching the data characteristics of (b) to known
     data characteristics for the known data formats until
     a match is accomplished; and

     (d) generating an indication that the data
     is in the matched data format.

2. The method of Claim 1 wherein the representation
of the data is uniformly formatted.

3. The method of Claim 2 wherein the uniformly
formatted data is in hexadecimal form.

4. The method of Claim 1 wherein step (b) comprises
analyzing a first portion of the data to identify
the location and value of particular data items.

5. The method of Claim 1 wherein before step
(c), including the step of inputting user-known data
characteristics.

6. The method of Claim 1 wherein step (d)
includes utilizing known data processing requirements for
each known data format, generating an indication of
preferred processing variables.

7. The method of Claim 1 wherein the set of
data are seismic data traces.